Speaking mode dependent pronunciation modeling in large vocabulary conversational speech recognition
Abstract
In spontaneous conversational speech there is a large amount of variability due to accents, speaking styles, and speaking rates (also known as the speaking mode) [3]. Because current recognition systems usually include only a relatively small number of pronunciation variants for each word in their dictionaries, the amount of variability they can model is limited. Increasing the number of variants per dictionary entry is the obvious solution. Unfortunately, this also increases the confusability between dictionary entries and thus often leads to an actual decrease in performance. In this paper we present a framework for speaking-mode-dependent pronunciation modeling. The probability of encountering a pronunciation variant is defined to be a function of the speaking style. The probability function is learned through decision trees from rule-based generated pronunciation variants as observed on the Switchboard corpus. The framework is successfully applied to significantly increase the performance of our state-of-the-art Janus Recognition Toolkit Switchboard recognizer.
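As a rough illustration of what "variant probability as a function of the speaking mode" can mean, the sketch below estimates P(variant | word, mode) from counts. It is not the paper's method: it swaps the decision trees for a plain conditional frequency table, reduces the speaking mode to a single binned speaking rate, and every name, threshold, and data point in it (`rate_bin`, `estimate_variant_probs`, the 12 phones/second cutoff, the toy observations) is a hypothetical chosen for illustration.

```python
from collections import Counter, defaultdict


def rate_bin(phones_per_second, threshold=12.0):
    """Map a speaking rate to a coarse mode label (threshold is an assumption)."""
    return "fast" if phones_per_second >= threshold else "slow"


def estimate_variant_probs(observations):
    """Estimate P(variant | word, mode) by relative frequency.

    observations: iterable of (word, phones_per_second, variant) tuples,
    where variant is a space-separated phone string.
    """
    counts = defaultdict(Counter)
    for word, rate, variant in observations:
        counts[(word, rate_bin(rate))][variant] += 1

    probs = {}
    for key, variant_counts in counts.items():
        total = sum(variant_counts.values())
        probs[key] = {v: c / total for v, c in variant_counts.items()}
    return probs


# Toy, made-up observations of the word "and" at different speaking rates.
observations = [
    ("and", 9.0, "ae n d"),
    ("and", 10.0, "ae n d"),
    ("and", 10.5, "ae n"),
    ("and", 14.0, "ae n"),
    ("and", 15.0, "n"),
    ("and", 16.0, "ae n"),
    ("and", 13.0, "ae n d"),
]

probs = estimate_variant_probs(observations)
for (word, mode), dist in sorted(probs.items()):
    print(word, mode, dist)
```

In this toy data the full form "ae n d" dominates in slow speech while reduced forms take most of the probability mass in fast speech, so a recognizer could weight dictionary variants by mode instead of treating all variants as equally likely. A decision tree, as used in the paper, would generalize this by choosing which mode features and thresholds to split on.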
Similar resources
Modeling Systematic Variations in Pronunciation via a Language-Dependent Hidden Speaking Mode
This paper describes the research efforts of the “Hidden Speaking Mode” group participating in the 1996 summer workshop on speech recognition. The goal of this project is to model pronunciation variations that occur in conversational speech in general and, more specifically, to investigate the use of a hidden speaking mode to represent systematic variations that are correlated with the word seq...
Rate-of-speech Modeling for Large Vocabulary Conversational Speech Recognition
Variations in rate of speech (ROS) produce changes in both spectral features and word pronunciations that affect automatic speech recognition (ASR) systems. To deal with these ROS effects, we propose to use parallel, rate-specific, acoustic models: one for fast speech, the other for slow speech. Rate switching is permitted at word boundaries, to allow modeling within-sentence speech rate variat...
The SRI March 2000 Hub-5 Conversational Speech Transcription System
We describe SRI’s large vocabulary conversational speech recognition system as used in the March 2000 NIST Hub-5E evaluation. The system performs four recognition passes: (1) bigram recognition with phone-loop-adapted, within-word triphone acoustic models, (2) lattice generation with transcription-mode-adapted models, (3) trigram lattice recognition with adapted cross-word triphone models, and ...
Rate-dependent Acoustic Modeling for Large Vocabulary Conversational Speech Recognition
Variations in rate of speech (ROS) produce changes in both spectral features and word pronunciations that affect automatic speech recognition (ASR) systems. To deal with these ROS effects, we propose to use parallel, rate-specific, acoustic models: one for fast speech, the other for slow speech. Rate switching is permitted at word boundaries, to allow modeling within-sentence speech rate variat...
Enhanced tree clustering with single pronunciation dictionary for conversational speech recognition
Modeling pronunciation variation is key for recognizing conversational speech. Rather than being limited to dictionary modeling, we argue that triphone clustering is an integral part of pronunciation modeling. We propose a new approach called enhanced tree clustering. This approach, in contrast to traditional decision tree based state tying, allows parameter sharing across phonemes. We show tha...